Superstore Dataset - Exploratory and Descriptive Analysis¶

In this notebook, I carry out an in-depth exploratory and descriptive analysis of the Superstore Dataset, a widely used dataset for understanding sales performance, customer behavior, and profitability based on various transactional and demographic attributes.

This phase of analysis is essential for uncovering sales trends, identifying key performance drivers, and gaining intuition about the dataset’s structure before applying any forecasting or optimization procedures. I examine the distribution of key numerical and categorical variables, investigate how product features, customer segments, and geographical regions relate to sales and profit levels, and use visualizations to summarize insights. Particular focus is placed on sales and profit disparities across product categories, customer segments, geographical regions, and shipping modes, helping lay a solid foundation for downstream business intelligence and strategic decision-making.

1. Import Libraries¶

The analysis begins by importing essential Python libraries for data handling, numerical computation, visualization, and directory management:

pandas: For efficient manipulation, filtering, and aggregation of tabular data.

numpy: Provides support for large, multi-dimensional arrays and mathematical functions.

os: Facilitates interaction with the operating system, particularly for path management.

plotly.express: A high-level API for creating interactive, publication-quality visualizations.

plotly.io: Used for configuring Plotly renderers and saving figures.

scipy.stats: Provides statistical functions for hypothesis testing.

statsmodels.api & statsmodels.formula.api: For statistical modeling, including ANOVA.

seaborn: For creating informative and attractive statistical graphics, especially heatmaps.

matplotlib.pyplot: A plotting library used for saving seaborn plots.

Define and Create Directory Paths¶

To ensure reproducibility and organized storage, I programmatically create directories if they don't already exist for:

  • raw data
  • processed data
  • results
  • documentation

These directories will store intermediate and final outputs for reproducibility.
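
A minimal sketch of this step is shown below. The folder names and the temporary project root are assumptions made for illustration; the notebook's actual paths are not given in the text.

```python
import os
import tempfile

# Hypothetical project root; a temporary folder keeps this sketch free of side effects.
project_root = tempfile.mkdtemp(prefix="superstore_")

# Assumed folder names for the four storage areas listed above.
directories = {
    "raw": os.path.join(project_root, "data", "raw"),
    "processed": os.path.join(project_root, "data", "processed"),
    "results": os.path.join(project_root, "results"),
    "docs": os.path.join(project_root, "docs"),
}

for path in directories.values():
    # exist_ok=True makes the call safe to repeat on later runs.
    os.makedirs(path, exist_ok=True)
    print(f"Ready: {path}")
```

Because of `exist_ok=True`, re-running the cell neither fails nor overwrites anything that is already in place.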

Loading the Cleaned Dataset¶

I load the cleaned version of the Superstore Dataset from the processed data directory into a pandas DataFrame. The head(10) method shows the first ten records, giving a glimpse of columns such as Order ID, Product Name, and Sales.

Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City ... Product Name Sales Quantity Discount Profit Returned Person Shipping Duration Order Year Order Month
0 1 CA-2016-152156 2016-11-08 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... Bush Somerset Collection Bookcase 261.9600 2 0.00 41.9136 No Cassandra Brandow 3 2016 11
1 2 CA-2016-152156 2016-11-08 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... Hon Deluxe Fabric Upholstered Stacking Chairs,... 731.9400 3 0.00 219.5820 No Cassandra Brandow 3 2016 11
2 3 CA-2016-138688 2016-06-12 2016-06-16 Second Class DV-13045 Darrin Van Huff Corporate United States Los Angeles ... Self-Adhesive Address Labels for Typewriters b... 14.6200 2 0.00 6.8714 No Anna Andreadi 4 2016 6
3 4 US-2015-108966 2015-10-11 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... Bretford CR4500 Series Slim Rectangular Table 957.5775 5 0.45 -383.0310 No Cassandra Brandow 7 2015 10
4 5 US-2015-108966 2015-10-11 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... Eldon Fold 'N Roll Cart System 22.3680 2 0.20 2.5164 No Cassandra Brandow 7 2015 10
5 6 CA-2014-115812 2014-06-09 2014-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles ... Eldon Expressions Wood and Plastic Desk Access... 48.8600 7 0.00 14.1694 No Anna Andreadi 5 2014 6
6 7 CA-2014-115812 2014-06-09 2014-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles ... Newell 322 7.2800 4 0.00 1.9656 No Anna Andreadi 5 2014 6
7 8 CA-2014-115812 2014-06-09 2014-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles ... Mitel 5320 IP Phone VoIP phone 907.1520 6 0.20 90.7152 No Anna Andreadi 5 2014 6
8 9 CA-2014-115812 2014-06-09 2014-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles ... DXL Angle-View Binders with Locking Rings by S... 18.5040 3 0.20 5.7825 No Anna Andreadi 5 2014 6
9 10 CA-2014-115812 2014-06-09 2014-06-14 Standard Class BH-11710 Brosina Hoffman Consumer United States Los Angeles ... Belkin F5C206VTEL 6 Outlet Surge 114.9000 5 0.00 34.4700 No Anna Andreadi 5 2014 6

10 rows × 26 columns
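
The loading step can be sketched as follows. The file name of the processed CSV is not stated in the text, so an in-memory CSV with three of the preview rows and a subset of columns stands in for it here.

```python
import io
import pandas as pd

# Stand-in for the processed CSV file; the real file name and path are not given.
csv_text = io.StringIO(
    "Row ID,Order ID,Ship Mode,Sales,Profit\n"
    "1,CA-2016-152156,Second Class,261.96,41.9136\n"
    "2,CA-2016-152156,Second Class,731.94,219.582\n"
    "3,CA-2016-138688,Second Class,14.62,6.8714\n"
)

# With a file on disk, pass its path to read_csv instead of the buffer.
df = pd.read_csv(csv_text)

# head(10) returns at most the first ten rows.
preview = df.head(10)
print(preview)
```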

1. Descriptive Statistics¶

This section provides summary statistics for both numerical and categorical variables:

  • Basic statistics (mean, std, min, max) for numerical columns such as Sales and Profit.

  • Frequency distributions of categorical columns such as Region, Category, and Segment.

Dataset Dimensions and Data Types¶

Here, I examine the structure of the dataset:

  • There are 9,994 entries and 26 variables.
  • The dataset includes both numerical (e.g., Sales, Profit, Quantity) and categorical variables (e.g., Category, Region, Segment).

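Understanding data types and null entries is essential before proceeding with analysis. The output below shows no missing values in any column, and Order Date and Ship Date are stored as object (string) columns rather than datetimes.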

Dataset Dimensions (Rows, Columns): (9994, 26)
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Row ID             9994 non-null   int64  
 1   Order ID           9994 non-null   object 
 2   Order Date         9994 non-null   object 
 3   Ship Date          9994 non-null   object 
 4   Ship Mode          9994 non-null   object 
 5   Customer ID        9994 non-null   object 
 6   Customer Name      9994 non-null   object 
 7   Segment            9994 non-null   object 
 8   Country            9994 non-null   object 
 9   City               9994 non-null   object 
 10  State              9994 non-null   object 
 11  Postal Code        9994 non-null   int64  
 12  Region             9994 non-null   object 
 13  Product ID         9994 non-null   object 
 14  Category           9994 non-null   object 
 15  Sub-Category       9994 non-null   object 
 16  Product Name       9994 non-null   object 
 17  Sales              9994 non-null   float64
 18  Quantity           9994 non-null   int64  
 19  Discount           9994 non-null   float64
 20  Profit             9994 non-null   float64
 21  Returned           9994 non-null   object 
 22  Person             9994 non-null   object 
 23  Shipping Duration  9994 non-null   int64  
 24  Order Year         9994 non-null   int64  
 25  Order Month        9994 non-null   int64  
dtypes: float64(3), int64(6), object(17)
memory usage: 2.0+ MB
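
The structural checks above, and the handling of the string-typed date columns, might look like the sketch below. It uses three rows copied from the preview. How Shipping Duration, Order Year, and Order Month were originally derived is an assumption on my part, although the result matches the values shown in the preview.

```python
import pandas as pd

# Three rows copied from the preview; the real frame has 9,994 rows.
df = pd.DataFrame({
    "Order Date": ["2016-11-08", "2016-06-12", "2015-10-11"],
    "Ship Date": ["2016-11-11", "2016-06-16", "2015-10-18"],
    "Sales": [261.96, 14.62, 957.5775],
})

print(df.shape)          # (rows, columns)
print(df.dtypes)         # the date columns are string-typed, not datetimes
print(df.isna().sum())   # number of missing values per column

# Convert the string dates so that date arithmetic works.
order_date = pd.to_datetime(df["Order Date"])
ship_date = pd.to_datetime(df["Ship Date"])

# Assumed derivation of the three helper columns.
df["Shipping Duration"] = (ship_date - order_date).dt.days
df["Order Year"] = order_date.dt.year
df["Order Month"] = order_date.dt.month
print(df)
```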

Summary Statistics: Numerical Variables¶

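This summary provides a snapshot of key distribution characteristics. Row ID and Postal Code are identifiers stored as integers, so their means and quartiles carry no analytical meaning.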

Summary statistics for numerical variables:
            Row ID   Postal Code         Sales     Quantity     Discount        Profit  Shipping Duration   Order Year  Order Month
count  9994.000000   9994.000000   9994.000000  9994.000000  9994.000000   9994.000000        9994.000000  9994.000000  9994.000000
mean   4997.500000  55190.379428    229.858001     3.789574     0.156203     28.656896           3.958175  2015.722233     7.809686
std    2885.163629  32063.693350    623.245101     2.225110     0.206452    234.260108           1.747567     1.123555     3.284654
min       1.000000   1040.000000      0.444000     1.000000     0.000000  -6599.978000           0.000000  2014.000000     1.000000
25%    2499.250000  23223.000000     17.280000     2.000000     0.000000      1.728750           3.000000  2015.000000     5.000000
50%    4997.500000  56430.500000     54.490000     3.000000     0.200000      8.666500           4.000000  2016.000000     9.000000
75%    7495.750000  90008.000000    209.940000     5.000000     0.200000     29.364000           5.000000  2017.000000    11.000000
max    9994.000000  99301.000000  22638.480000    14.000000     0.800000   8399.976000           7.000000  2017.000000    12.000000

Interpretation of Numerical Summary Statistics:¶

  • Sales: Average sales are around \$229.86, but with a very high standard deviation (\$623.25) and a maximum of \$22,638.48. This indicates a wide range of sales values and likely the presence of outliers or high-value transactions. The median (\$54.49) is much lower than the mean, suggesting a right-skewed distribution where a few large sales pull the average up.
  • Quantity: The average quantity per order line is about 3.8, with a relatively small standard deviation (2.2), indicating that most lines involve only a few items. The range is from 1 to 14.
  • Discount: Discount percentages are applied across a broad spectrum, from 0% to 80%. A notable observation is that the median and 75th percentile are both 20%, indicating that a 20% discount is a very common promotional strategy. The presence of 0% discounts suggests many items are sold at full price. This variable is crucial for understanding its impact on profitability.
  • Profit: The average profit is \$28.66, but the standard deviation is high (\$234.26), and the minimum profit is -\$6,599.98. This indicates significant losses on some transactions. The median profit (\$8.67) is much lower than the mean, similar to sales, suggesting that a few highly profitable sales are skewing the average, and a deeper dive into loss-making transactions is warranted.
  • Shipping Duration: On average, shipping takes about 4 days, with a range from 0 to 7 days. This metric can be important for customer satisfaction and logistical efficiency.
  • Order Year/Month: Orders span 2014 through 2017 and cover all twelve months, which defines the temporal scope of the dataset.
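
The mean-versus-median argument used for Sales and Profit can be checked directly in pandas. The sketch below uses only the ten Sales values from the preview rows shown earlier, so its numbers differ from the full-data statistics; it illustrates the check rather than reproducing the result.

```python
import pandas as pd

# Sales values of the ten preview rows (not the full column).
sales = pd.Series([261.96, 731.94, 14.62, 957.5775, 22.368,
                   48.86, 7.28, 907.152, 18.504, 114.9])

mean_sales = sales.mean()
median_sales = sales.median()
skewness = sales.skew()  # sample skewness; positive values indicate a right tail

print(f"mean={mean_sales:.2f}, median={median_sales:.2f}, skew={skewness:.2f}")
```

A mean well above the median together with positive skewness is the pattern described in the bullets for Sales and Profit.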

Summary Statistics: Categorical Variables¶

This summary provides insights into the distribution and most frequent categories for the object (categorical) variables in the dataset.

Summary statistics for categorical variables:
Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City State Region Product ID Category Sub-Category Product Name Returned Person
count 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994 9994
unique 5009 1237 1334 4 793 793 3 1 531 49 4 1862 3 17 1850 2 4
top CA-2017-100111 2016-09-05 2015-12-16 Standard Class WB-21850 William Brown Consumer United States New York City California West OFF-PA-10001970 Office Supplies Binders Staple envelope No Anna Andreadi
freq 14 38 35 5968 37 37 5191 9994 915 2001 3203 19 6026 1523 48 9194 3203

Interpretation of Categorical Summary Statistics:

  • Ship Mode: 'Standard Class' is by far the most common shipping method, accounting for nearly 60% of all orders (5968 entries). This suggests a preference for cost-effective shipping or that faster options are less frequently utilized.
  • Segment: The 'Consumer' segment represents the largest customer base, making up over half of the orders (5191 entries). This indicates that individual consumers are the primary drivers of sales, followed by Corporate and Home Office segments.
  • Country: The dataset is entirely focused on the 'United States', with all 9994 entries originating from this country. This confirms the geographical scope of the Superstore operations captured in this data.
  • Region: The 'West' region has the highest number of orders (3203 entries), indicating it is the most active sales region, followed by East, Central, and South.
  • Category: 'Office Supplies' is the dominant product category, accounting for over 60% of all transactions (6026 entries). This highlights its central role in the Superstore's product offerings, followed by 'Furniture' and 'Technology'.
  • Sub-Category: Within 'Office Supplies', 'Binders' is the most frequently purchased sub-category (1523 entries), suggesting high demand for these items.
  • Returned: Most rows are marked 'No' (9,194 entries), leaving 800 returned rows, a return rate of about 8%. Whether this reflects customer satisfaction or product quality cannot be determined from the counts alone.
  • Person: This appears to be the sales representative or manager responsible for a region. There are four distinct names, and the most frequent one, 'Anna Andreadi', appears 3,203 times, which matches the row count of the West region.

Key Categorical Distributions¶

  • The distribution of key categorical variables describes the Superstore's operational landscape and customer base.

  • The output below shows counts and proportions (in %) of entries per Region, Category, and Segment.

Distribution of 'Region':
West       3203
East       2848
Central    2323
South      1620
Name: Region, dtype: int64

Normalized Distribution of 'Region':
West       32.05%
East        28.5%
Central    23.24%
South      16.21%
Name: Region, dtype: object

Distribution of 'Category':
Office Supplies    6026
Furniture          2121
Technology         1847
Name: Category, dtype: int64

Normalized Distribution of 'Category':
Office Supplies     60.3%
Furniture          21.22%
Technology         18.48%
Name: Category, dtype: object

Distribution of 'Segment':
Consumer       5191
Corporate      3020
Home Office    1783
Name: Segment, dtype: int64

Normalized Distribution of 'Segment':
Consumer       51.94%
Corporate      30.22%
Home Office    17.84%
Name: Segment, dtype: object

Interpretation of Key Categorical Distributions:¶

  • Region Distribution: The dataset shows a clear geographical imbalance. The 'West' region leads in terms of order volume (3,203 entries), closely followed by 'East' (2,848 entries). 'Central' (2,323 entries) and 'South' (1,620 entries) have fewer transactions. This distribution is further reflected in total sales, where the 'West' (\$725,457.82) and 'East' (\$678,781.24) regions generate the highest revenues, indicating they are the primary sales drivers for the Superstore.
  • Category Distribution: 'Office Supplies' is the most dominant product category, accounting for approximately 60.3% of all transactions (6,026 entries). This highlights its central role in the Superstore's inventory and customer purchases. 'Furniture' (21.2%) and 'Technology' (18.5%) make up the remaining significant portions, suggesting a diverse product offering but with a strong emphasis on office-related items.
  • Segment Distribution: The 'Consumer' segment represents the largest customer base, comprising about 51.9% of all orders (5,191 entries). This indicates that individual customers are the primary focus of the Superstore's sales efforts. The 'Corporate' segment accounts for 30.2% (3,020 entries), and 'Home Office' for 17.8% (1,783 entries), showing a substantial presence in business-to-business and small office markets as well.
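
The counts and percentages above correspond to `value_counts` with and without `normalize=True`. As a sketch, the Region column is rebuilt here from the reported counts so that the example runs without the data file; on the real frame the calls would be applied to `df["Region"]`.

```python
import pandas as pd

# Rebuild the Region column from the counts reported above.
region = pd.Series(
    ["West"] * 3203 + ["East"] * 2848 + ["Central"] * 2323 + ["South"] * 1620,
    name="Region",
)

counts = region.value_counts()                                   # absolute counts
share = region.value_counts(normalize=True).mul(100).round(2)    # percent of rows

print(counts)
print(share)
```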

2. Exploratory Data Analysis (EDA)¶

This section focuses on in-depth Exploratory Data Analysis (EDA) through various visualizations.

Monthly Sales Trend Analysis¶

This line chart displays total sales by calendar month, summed over all order years (2014 to 2017), revealing distinct seasonal peaks and valleys in business performance.

Key Insights:¶

  • Strong year-end performance, with November at the peak (about 352k) followed by December (about 325k)
  • September is the third-highest month at about 308k, possibly reflecting back-to-school or end-of-Q3 demand
  • February is the lowest point at approximately 60k, a pronounced seasonal dip
  • Mid-year months (May-July) hold steady at around 150k with minimal fluctuation

Total Sales by Month:
   Order Month        Sales
0      January   94924.8356
1     February   59751.2514
2        March  205005.4888
3        April  137762.1286
4          May  155028.8117
5         June  152718.6793
6         July  147238.0970
7       August  159044.0630
8    September  307649.9457
9      October  200322.9847
10    November  352461.0710
11    December  325293.5035
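
A table like the one above can be produced with a group-by on the month column. The sketch below uses six toy rows taken from the preview, so only three months appear; the column names match the dataset, while the use of `calendar.month_name` for labels is my own choice.

```python
import calendar
import pandas as pd

# Toy rows standing in for the full data: month number and sales per order line.
df = pd.DataFrame({
    "Order Month": [11, 11, 6, 10, 10, 6],
    "Sales": [261.96, 731.94, 14.62, 957.5775, 22.368, 48.86],
})

# groupby sorts the numeric month keys, so rows come out in calendar order.
monthly_sales = df.groupby("Order Month", as_index=False)["Sales"].sum()

# Replace month numbers with month names for display.
monthly_sales["Order Month"] = monthly_sales["Order Month"].map(
    lambda m: calendar.month_name[m]
)
print(monthly_sales)
```

Grouping on the month number and renaming afterwards avoids the alphabetical ordering that grouping on month names would produce.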

Recommendation:¶

  • Capitalize on Q4 momentum by ensuring adequate inventory and staffing for November-December surge
  • Develop targeted campaigns to boost February sales and minimize seasonal impact
  • Investigate September success factors to replicate growth strategies in other months
  • Implement consistent marketing efforts during stable mid-year period to drive incremental growth

Correlation Matrix of Numerical Variables¶

This correlation heatmap displays the pairwise relationships between key business metrics: Sales, Quantity, Discount, Profit, and Shipping Duration. Correlation coefficients range from -1 to +1.

Key Observations:¶

  • Sales and Profit show a moderate positive correlation (0.48): higher sales tend to come with higher profit, but the association is far from perfect.
  • Discount and Profit show a weak negative correlation (-0.22): higher discounts are associated with lower profit. Correlation alone does not establish that discounts cause the loss.
  • Sales and Quantity show a weak positive correlation (0.20), suggesting that sales value isn't strongly driven by quantity alone.
  • Shipping Duration shows minimal correlation with the other variables (all ≤ 0.02), so delivery time has no meaningful linear association with sales, profit, or discount.
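
A correlation matrix of this kind can be computed with `DataFrame.corr`. The sketch below uses the ten preview rows shown earlier, so the coefficients it prints will not match the full-data values quoted above.

```python
import pandas as pd

# The ten preview rows; the reported coefficients come from all 9,994 rows.
df = pd.DataFrame({
    "Sales": [261.96, 731.94, 14.62, 957.5775, 22.368,
              48.86, 7.28, 907.152, 18.504, 114.9],
    "Quantity": [2, 3, 2, 5, 2, 7, 4, 6, 3, 5],
    "Discount": [0.0, 0.0, 0.0, 0.45, 0.2, 0.0, 0.0, 0.2, 0.2, 0.0],
    "Profit": [41.9136, 219.582, 6.8714, -383.031, 2.5164,
               14.1694, 1.9656, 90.7152, 5.7825, 34.47],
    "Shipping Duration": [3, 3, 4, 7, 7, 5, 5, 5, 5, 5],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))

# The matrix can then be drawn as a heatmap, e.g. seaborn.heatmap(corr, annot=True).
```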

Recommendation:¶

  • Focus on strategies that maximize the Sales-Profit relationship by identifying high-value, high-margin products rather than relying solely on volume increases.
  • Develop sophisticated pricing models that balance discount impact with sales volume, given the clear negative correlation between discounting and profitability.

Association Between Ship Mode and Returns¶

This heatmap displays the contingency table showing the relationship between shipping methods and return status, with statistical significance confirmed by a Chi-squared test (χ² = 22.95, p < 0.001).

Key Observations:¶

  • Same Day shipping has the highest return rate at 11.79% (64 returns out of 543 orders), which is concerning given its premium cost.
  • First Class shipping shows a surprisingly high return rate of 9.88% (152 returns out of 1,538 orders), exceeding Standard Class returns.
  • Standard Class, despite having the highest volume (5,968 orders), maintains a moderate return rate of 7.54% (450 returns).
  • Second Class performs best with the lowest return rate at 6.89% (134 returns out of 1,945 orders).
  • The two premium modes, Same Day and First Class, have the highest return rates, although the ordering is not strictly by service level: Standard Class (7.54%) sits above Second Class (6.89%).

Contingency Table:
                   No  Yes
First Class     1386  152
Same Day         479   64
Second Class    1811  134
Standard Class  5518  450

Chi-squared Statistic: 22.95
P-value: 0.000
Degrees of Freedom: 3
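
The return rates and the test statistic can be reproduced from the contingency table alone. The sketch below assumes the test is SciPy's `chi2_contingency`, which matches the list of imported libraries but is not confirmed by the text.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts copied from the contingency table above.
observed = pd.DataFrame(
    {"No": [1386, 479, 1811, 5518], "Yes": [152, 64, 134, 450]},
    index=["First Class", "Same Day", "Second Class", "Standard Class"],
)

# Return rate per shipping mode, in percent.
return_rate = (observed["Yes"] / observed.sum(axis=1) * 100).round(2)

# Pearson chi-squared test of independence between ship mode and return status.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(return_rate)
print(f"chi2={chi2:.2f}, p={p_value:.3g}, dof={dof}")
```

Printing the p-value with `.3g` rather than three fixed decimals avoids the uninformative `0.000` in the output above.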

Recommendation:¶

  • Investigate root causes of high return rates for premium shipping services, particularly Same Day delivery.
  • Implement enhanced quality control processes for expedited shipments to meet higher customer expectations.
  • Consider adjusting pricing strategies to account for the additional costs associated with higher return rates in premium shipping.
  • Develop targeted customer communication strategies to set appropriate expectations for different shipping tiers.

3. Statistical Hypothesis Testing¶

A. ANOVA Test: Profit by Category¶

This ANOVA table shows the statistical analysis of profit differences across product categories (Furniture, Office Supplies, Technology) using one-way ANOVA.

Key Observations:¶

  • The F-statistic is 54.31 with a p-value of 3.47 × 10⁻²⁴ (p < 0.001), strong statistical evidence that mean profit differs between categories.
  • With 2 degrees of freedom for categories and 9,991 for residuals, the sample is large enough to detect even small differences.
  • The sum of squares for categories (5.90 × 10⁶) is small relative to the residual sum of squares (5.42 × 10⁸): category accounts for only about 1.1% of the total variation in profit.
  • The result is therefore highly significant, but the effect is small. Category shifts mean profit, yet most profit variation occurs within categories.

Hypothesis: Is there a significant difference in profit across different product categories?

- Null Hypothesis: The mean profit is the same across all product categories.
- Alternative Hypothesis: At least one product category has a different mean profit.

ANOVA Table for Profit by Category:
                   sum_sq      df          F        PR(>F)
C(Category)  5.898009e+06     2.0  54.311023  3.469918e-24
Residual     5.424958e+08  9991.0        NaN           NaN
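
The F-statistic, its p-value, and the share of variance explained by Category can be recomputed from the table above. This is a sketch based on the rounded table values; eta-squared is an effect-size measure that I add here and that the original output does not report.

```python
from scipy.stats import f as f_dist

# Values copied from the ANOVA table above.
ss_category, df_category = 5.898009e6, 2
ss_residual, df_residual = 5.424958e8, 9991

# F is the ratio of the between-group to the within-group mean square.
f_stat = (ss_category / df_category) / (ss_residual / df_residual)

# Upper-tail probability of the F distribution gives the p-value.
p_value = f_dist.sf(f_stat, df_category, df_residual)

# Eta-squared: share of total profit variance attributable to Category.
eta_squared = ss_category / (ss_category + ss_residual)

print(f"F={f_stat:.2f}, p={p_value:.3g}, eta_squared={eta_squared:.4f}")
```

With almost 10,000 rows, a very small p-value and a small eta-squared can coexist, so the two numbers are best read together.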

Recommendation:¶

  • Focus strategic attention on the lowest-performing category to improve profit margins through pricing optimization or cost reduction.
  • Leverage insights from the highest-performing category to develop best practices for other product lines.
  • Consider category-specific business strategies rather than uniform approaches across all product categories.

B. T-test: Discounted vs. Non-discounted Profit¶

This independent samples t-test compares profit between discounted items (discount > 0%) and non-discounted items (discount = 0%) to determine whether mean profit differs between the two groups.

Key Observations:¶

  • The t-statistic is -15.74. The negative sign means that the mean profit of discounted items is lower than that of non-discounted items.
  • The p-value of 4.36 × 10⁻⁵⁵ (p < 0.001) is strong statistical evidence that the two group means differ.
  • The test establishes an association rather than a causal effect, but the size of the gap has practical implications for pricing strategy.

Hypothesis: Is there a significant difference in profit between items sold with a discount and items sold without a discount?

- Null Hypothesis: The mean profit for discounted items is equal to the mean profit for non-discounted items.
- Alternative Hypothesis: The mean profit for discounted items is different from the mean profit for non-discounted items.

T-test: Profit difference by Discount status
T-statistic: -15.737992941015493
P-value: 4.356930371141414e-55
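
The group split and the test can be sketched as follows, using the ten preview rows shown earlier rather than the full data. The text does not say whether the equal-variance or the Welch variant was run; the sketch uses SciPy's default (`equal_var=True`) and marks that as an assumption.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Discount and Profit of the ten preview rows (illustration only).
df = pd.DataFrame({
    "Discount": [0.0, 0.0, 0.0, 0.45, 0.2, 0.0, 0.0, 0.2, 0.2, 0.0],
    "Profit": [41.9136, 219.582, 6.8714, -383.031, 2.5164,
               14.1694, 1.9656, 90.7152, 5.7825, 34.47],
})

discounted = df.loc[df["Discount"] > 0, "Profit"]
full_price = df.loc[df["Discount"] == 0, "Profit"]

# Assumption: Student's t-test (equal variances); pass equal_var=False for Welch's test.
t_stat, p_value = ttest_ind(discounted, full_price, equal_var=True)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3g}")
```

Passing the discounted group first means a negative statistic indicates lower mean profit for discounted items, the same sign convention as in the output above.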

Recommendation¶

  • Implement stricter discount controls by analyzing profit thresholds to establish discount caps and approval processes that prevent excessive discounting.
  • Shift toward value-based pricing strategies and evaluate whether current discounting practices generate sufficient volume increases to justify the significant profit reduction.

C. Chi-Square Test: Region vs. Returns¶

This contingency table and chi-square test analyze the relationship between geographic regions and return rates, with statistical significance confirmed by χ² = 343.57 and p < 0.001.

Key Insights:¶

  • West region has an exceptionally high return rate of 15.30% (490 returns out of 3,203 orders), well above every other region.
  • Central region performs best with the lowest return rate at 3.96% (92 returns out of 2,323 orders).
  • East region shows a moderate return rate of 5.23% (149 returns out of 2,848 orders).
  • South region maintains a relatively low return rate of 4.26% (69 returns out of 1,620 orders).
  • The West region's return rate is nearly four times that of the Central region, indicating significant regional disparities in customer satisfaction or operational issues.

Contingency Table (Region vs. Returned):
Returned    No  Yes
Region             
Central   2231   92
East      2699  149
South     1551   69
West      2713  490

Chi-square statistic: 343.5657
P-value for Region vs. Returned: 0.0000

Conclusion: Since the p-value (0.0000) is less than 0.05, we reject the null hypothesis.

There is a statistically significant association between 'Region' and whether an item is 'Returned'.

This suggests that return rates may vary by region, warranting further investigation into regional customer satisfaction or product quality issues.
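
To see which region drives the association, the observed returns can be compared with the counts expected under independence. The sketch below works from the contingency table above and assumes SciPy's `chi2_contingency`; the "excess returns" column is my own illustrative addition.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Observed counts copied from the contingency table above.
observed = pd.DataFrame(
    {"No": [2231, 2699, 1551, 2713], "Yes": [92, 149, 69, 490]},
    index=["Central", "East", "South", "West"],
)

chi2, p_value, dof, expected_array = chi2_contingency(observed)

# Expected counts under independence, aligned with the observed table.
expected = pd.DataFrame(expected_array, index=observed.index, columns=observed.columns)

summary = pd.DataFrame({
    "return_rate_pct": (observed["Yes"] / observed.sum(axis=1) * 100).round(2),
    # Positive values mean more returns than independence would predict.
    "excess_returns": (observed["Yes"] - expected["Yes"]).round(1),
})

print(f"chi2={chi2:.4f}, p={p_value:.3g}, dof={dof}")
print(summary)
```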

Recommendation:¶

  • Conduct immediate investigation into West region operations, including supplier quality, fulfillment processes, and customer service practices to identify root causes of high return rates.
  • Implement best practices from Central and South regions (low return rates) across all locations, and consider region-specific quality control measures to address the significant geographic variation in returns.

Impact of Discount on Profit Across Categories¶

This scatter plot illustrates the relationship between discount rate and profit, with points colored by product Category (Furniture, Office Supplies, Technology).

Key Observations:¶

  • Higher discounts generally correlate with lower profits, particularly for Office Supplies and Technology.

  • Many points with discounts above 40% fall into the negative profit range, indicating possible over-discounting.

  • Some Technology items show high profits at 0% discount, emphasizing that discounts may not be necessary for profitable sales in this category.

  • Furniture items seem more stable but also tend to lose profitability with increasing discounts.

Recommendation:¶

  • Re-evaluate the discounting strategy, especially for high-loss categories like Office Supplies.

  • Consider implementing a discount cap policy to avoid over-discounting.

  • Use targeted discounting rather than blanket discounts to preserve profitability while still driving sales.

Sales vs Profit by Sub-Category¶

These bar charts show total sales and profit by product sub-category.

Key Observation:¶

  • Phones and Chairs have the highest sales, but Chairs have lower profit contribution.

  • Copiers generate the highest profit, even with moderate sales.

  • Tables, Bookcases, and Supplies generate losses, despite decent sales volumes.

  • Accessories and Binders show healthy profit-to-sales alignment.

Recommendation:¶

  • Boost sales of high-profit items like Copiers and Accessories.

  • Reassess or reduce focus on Tables, Bookcases, and Supplies due to consistent losses.

  • Monitor the profitability of Chairs and Appliances, and optimize pricing or reduce costs where needed.

  • Align marketing and discounting strategy to maximize profitability, not just sales.

Profit Analysis by Customer Segment¶

Insights from the Boxplot:

  • All three segments show a similar spread and central tendency of profit values.

  • There are outliers in each group (e.g., high profits and losses), but the overall shapes of the distributions are comparable.

  • This visually supports the ANOVA finding: segment-based targeting may not drive significantly different profits.

--- ANOVA Results: Profit by Segment ---
F-statistic: 0.90
P-value: 4.074e-01

Interpretation of ANOVA:¶

  • The p-value is greater than 0.05, indicating no statistically significant difference in the average profit across the three customer segments: Consumer, Corporate, and Home Office.
  • This suggests that the customer segment is not a strong predictor of profitability.
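
A one-way ANOVA across segments can be run with `scipy.stats.f_oneway`, which returns an F-statistic and a p-value in the same form as the output above. The sketch below uses synthetic profits drawn from a single distribution, purely to show the mechanics; none of its numbers come from the Superstore data.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(seed=0)

# Synthetic stand-in data: all three segments share one profit distribution.
df = pd.DataFrame({
    "Segment": np.repeat(["Consumer", "Corporate", "Home Office"], [300, 200, 100]),
    "Profit": rng.normal(loc=30.0, scale=200.0, size=600),
})

# One array of profits per segment, in alphabetical segment order.
groups = [g["Profit"].to_numpy() for _, g in df.groupby("Segment")]

f_stat, p_value = f_oneway(*groups)
print(f"F-statistic: {f_stat:.2f}")
print(f"P-value: {p_value:.3e}")
```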

Recommendation:¶

  • Focus less on customer segmentation as a profit strategy. Instead, analyze other factors such as product sub-categories, regions, or discount levels, which may have a stronger impact on profitability.

Sales Performance by Shipping Mode¶

This bar chart shows total sales distribution across four shipping modes, with Standard Class significantly dominating the sales volume.

Key insights:¶

  • Standard Class generates approximately 1.4M in sales, representing the majority of total revenue
  • Second Class follows with around 0.46M, while First Class contributes approximately 0.36M
  • Same Day shipping has the lowest sales volume at roughly 0.13M

Recommendation:¶

  • Leverage Standard Class popularity by ensuring adequate inventory and capacity for this shipping option
  • Investigate opportunities to upsell customers from Standard to higher-margin express shipping options
  • Consider promotional strategies to boost Same Day and First Class usage if they offer better profit margins
  • Analyze customer preferences to optimize shipping portfolio and pricing strategy

Product Quantity Sales by Customer Segment¶

This bar chart displays the total quantity of products sold across three customer segments, revealing clear volume differences between segments.

Key Insights:¶

  • Consumer segment dominates with approximately 20k units sold, about 1.7 times the Corporate segment
  • Corporate segment follows with around 12k units, representing mid-level volume
  • Home Office segment shows the lowest sales quantity at roughly 7k units

Total Quantity by Segment:
       Segment  Quantity
0     Consumer     19521
1    Corporate     11608
2  Home Office      6744

Recommendation:¶

  • Focus marketing efforts on Consumer segment given its high volume potential and current performance
  • Develop targeted strategies to increase Home Office segment penetration through specialized product bundles
  • Analyze Corporate segment needs to identify growth opportunities and optimize B2B sales approaches
  • Consider segment-specific inventory management to align stock levels with demand patterns

Profit Distribution by Product Category¶

This pie chart shows the profit contribution of three product categories, with Technology leading as the primary profit driver.

Key Insights:¶

  • Technology dominates profit generation at 50.8%, contributing just over half of total profits
  • Office Supplies accounts for 42.8% of profits, a substantial second source of profit
  • Furniture contributes only 6.4% to total profits, indicating underperformance relative to other categories

Total Profit by Category:
          Category       Profit
0        Furniture   18451.2728
1  Office Supplies  122490.8008
2       Technology  145454.9481
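
The percentages in the pie chart follow directly from the category totals. The sketch below starts from the totals printed above; on the full frame they would come from something like `df.groupby("Category")["Profit"].sum()`.

```python
import pandas as pd

# Total profit per category as reported in the output above.
category_profit = pd.Series({
    "Furniture": 18451.2728,
    "Office Supplies": 122490.8008,
    "Technology": 145454.9481,
})

# Each category's share of total profit, in percent.
profit_share = (category_profit / category_profit.sum() * 100).round(1)
print(profit_share)
```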

Recommendation:¶

  • Continue investing in Technology category given its strong profit performance and market leadership
  • Optimize Office Supplies operations to maintain its substantial profit contribution
  • Investigate Furniture category challenges and develop strategies to improve profitability through pricing, cost reduction, or product mix optimization
  • Consider reallocating resources from low-performing Furniture to high-margin Technology products

Sales Performance by Geographic Region¶

This bar chart reveals regional sales distribution across four geographic areas, showing significant variation in market performance.

Key Insights:¶

  • West region leads with approximately 725k in sales, establishing it as the top-performing market
  • East region follows closely with around 680k, representing strong secondary performance
  • Central region generates moderate sales at roughly 500k, showing decent market penetration
  • South region underperforms significantly at about 390k, indicating potential growth opportunities

Total Sales by Region:
    Region        Sales
3     West  725457.8245
1     East  678781.2400
0  Central  501239.8908
2    South  391721.9050

Recommendation:¶

  • Maintain and expand successful strategies from West and East regions to sustain market leadership
  • Investigate South region challenges and implement targeted initiatives to boost performance
  • Analyze Central region potential for optimization and growth acceleration
  • Consider reallocating resources or marketing efforts to underperforming regions while protecting strong markets